Abstract

The overarching objective of the proposed research was to develop an extension of the temporal
context model that integrates a representation of the time at which stimuli were experienced, so that
the model can describe a broad variety of phenomena across subfields of cognitive psychology. This was
accomplished. We published a paper in Neural Computation that describes such a model and applies it to
problems in episodic memory, timing, and classical conditioning. A more detailed version of this model,
described in a paper revised for Psychological Review, extends it to a wider range of phenomena by
introducing a translation operator, which allows the construction of trajectories of predicted future
states, and a jump back in time, which allows an account of contiguity effects in episodic memory. In
that paper we applied the model to problems from episodic memory and working memory, as well as to
second-order conditioning problems from trace conditioning. We met the near-term goal of generating
simulation code for the model of timing, with three different approaches to simulating the equations
implemented in the R programming language. The first simulates the time series of inputs as a series
of delta functions. The second approximates the representation of history by implementing a partial
differential equation that it obeys. The third, which is the most general but is subject to
approximation errors, explicitly calculates the operator that inverts the Laplace transform. The
sensitivity of the model to noise was analyzed in the Neural Computation paper. We also made progress
towards the longer-term goals of the proposal. Although the results are not yet published, we have
analyzed the properties of the model as applied to semantic memory. Although we have not yet conducted
the simulations, we successfully secured funding from the NSF to pursue that longer-term goal.


FA9550-10-1-0149 supported the development of a quantitative model of stimulus history and its
application to numerous phenomena in cognitive psychology. That model rests on a mathematical formalism
in which a stimulus history $\mathbf{f}(\tau)$ is encoded by means of the Laplace transform and then
reconstructed by an approximate inversion of that transform.

The research proposed several sub-projects, grouped roughly into near-term and medium-term
projects. Here we summarize the final status of these sub-projects. After this brief summary, we
describe the scientific content of the output of the research project in considerably more detail.
Near-term  projects 

Near-term projects were anticipated to be completed in the first year of funding. All of these were
completed as anticipated.

1. Development of a simulation library for remembered time. The model started as a mathematical
formulation. To facilitate comparison of the equations with behavioral and neural results, we proposed
to develop a software simulation library that implements the equations. As it turned out, we developed
three versions of the library in the R programming language, each corresponding to a different way of
implementing the equations. We have provided these libraries to other investigators on request to
facilitate widespread use of the model in scientific investigation. We have also facilitated the
development, by the Dennis lab at Ohio State, of a Python library based on our R code.

2. Analytic study of recency and contiguity effects. The recency effect and the contiguity effect
refer to two fundamental aspects of episodic memory. The recency effect refers to the finding that, all
other things being equal, stimuli experienced more recently are better remembered than stimuli
experienced less recently (Murdock, 1962; Glenberg et al., 1980). The contiguity effect refers to the
finding that, all other things being equal, when one stimulus comes to mind, the next stimulus that
comes to mind is likely to have been experienced close in time to the first stimulus (Kahana, 1996).
Both of these phenomena have been observed over a variety of time scales (Glenberg et al., 1980; Howard
& Kahana, 1999; Howard, Youker, & Venkatadass, 2008; Unsworth, 2008), suggesting a common mechanism
that is not based on traditional ideas of short-term memory.

We proposed to analyze these findings in the context of our model of remembered time. The basic
idea is that the current state of history serves as a cue to initiate recall, leading to the recency
effect. Contiguity was hypothesized to arise because remembering a stimulus causes recovery of the
state of history that obtained when that stimulus was initially experienced. Because the
representation of history is scale-free, this mechanism should account for both findings over a
variety of time scales.

The model’s treatment of the recency effect was first addressed in Shankar and Howard (2012).
Recency worked as expected. The account of the contiguity effect was slightly more involved and is
presented in Howard, Shankar, Aue, and Criss (In preparation). Interestingly, the model predicts
violations of strict scale-invariance in the contiguity effect that appear consistent with
experimentally observed findings. Because of the centrality of contiguity in theories of episodic
memory, we were pleased to contribute to two unanticipated side projects that extended this goal beyond
our initial expectations. One was the completion of revisions to a paper confirming neurophysiological
predictions of the account of contiguity based on recovery of temporal information (Howard, Viskontas,
Shankar, & Fried, In press). In that paper, the pattern of activity in ensembles of neurons in the
human medial temporal lobe (MTL) showed evidence for a “jump back in time,” as we had anticipated. In a
second paper, we helped design and analyze an empirical study that developed a new experimental
methodology for measuring the contiguity effect behaviorally in human subjects (Kilic, Criss, &
Howard, In press). Because that study established a causal relationship between the presentation of
the cue and the recovered memories, its results cannot be accounted for by several competing
explanations of the contiguity effect (Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, & Usher, 2005;
Grossberg & Pearson, 2008; Farrell, 2012).
3. Sensitivity to noise. This subproject was designed to evaluate the model’s resistance to stochastic
noise in its inputs. This is an important topic because, if the model were too sensitive to
disruptions, it would be unable to represent stimulus history over the long periods of time necessary
to account for the behavioral phenomena under investigation. These analyses were conducted and reported
in detail in Shankar and Howard (2012). Briefly, the representation of stimulus history is surprisingly
robust to stochastic disruptions in the input patterns. We further examined the detailed response of
the model to signals with various spectral signatures in Shankar and Howard (submitted).

Medium-term  projects 

Medium-term projects were ones that would not be likely to be completed within a single year. We
hoped to complete these within the two-year window. As it turned out, we successfully completed one of
these subprojects and made strong progress on the second.

Temporal mapping in trace conditioning. Previous work in the mathematical modeling of memory has
focused either on human memory experiments, in which subjects learn lists of words, or on conditioning
experiments, in which rats learn to respond appropriately to stimuli presented on various schedules.
This parochial approach seems to us poorly suited to the goal of developing a general theory of memory
that applies across species. With this in mind, we proposed to describe phenomena from the temporal
mapping literature (e.g., Cole, Barnet, & Miller, 1995), a conditioning paradigm in which rats learn to
integrate temporal relationships between stimuli (see below for more detail).

We describe our treatment of temporal mapping in some detail below and in considerably more detail
in Howard et al. (In preparation). Briefly, the solution turned out to require major advances in the
mathematical framework. First, as anticipated, we required the same “jump back in time” that we used to
account for the contiguity effect in episodic memory studies, suggesting a deep analogy between human
list learning and animal classical conditioning studies. Second, we needed to construct a “translation
operator” that enables the model to “play forward” anticipated future states of the world using the
current state of history. This translation operator (described below) also, fortuitously, enabled an
account of other behavioral phenomena, namely interval timing across scales and the “time-left
procedure” from animal conditioning studies (see Howard et al., In preparation, for details).

Modeling the development of semantic representations in higher-order statistical environments.
Episodic memory refers to the ability to remember specific instances from one’s life. Episodic memories
are rich in detail, as if the rememberer were reexperiencing the event. In contrast, semantic memory
reflects general, verbalizable knowledge that does not depend on reexperiencing the specific learning
event. For instance, the question “what did you have for breakfast?” usually evokes an episodic memory
in which one in some sense relives one’s most recent breakfast as part of answering the question. In
contrast, “what is the capital of Vermont?” does not usually evoke a vivid recollection of the moment
when that fact was learned. We have extensively applied the mathematical framework to phenomena from
episodic memory experiments. The goal of this subproject was to extend the formalism to account for
semantic memory phenomena as well. Although the results are not yet published, we have solved several
problems that will be necessary for describing semantic memory.

The basic idea is that meaning in semantic memory is developed largely by learning the temporal
roles played by different stimuli. For instance, consider the novel word FLOOB. Prior to experience,
there is no information about the meaning of FLOOB. However, after a single exposure in the sentence
“The baker reached into the oven and pulled out the FLOOB,” a tremendous amount of information about
the meaning of FLOOB is available to the learner. The basic hypothesis is that we can use the current
state of temporal history to generate a prediction, and that this prediction can be used to inform the
meaning of the stimuli that are experienced (Shankar, Jagadisan, & Howard, 2009; Howard, Shankar, &
Jagadisan, 2011).

Previous work used this general approach to learn from a much simpler representation of temporal
context. Our long-term goal is to use the richer representation of temporal history we developed to
fulfill the same role. Previously we established that the version of the model based on temporal
context can learn perfectly any language that can be described by bigram transition probabilities
(Shankar et al., 2009), that is, any language in which the identity of the next symbol can be deduced
from the prior symbol alone. Unpublished analyses have shown that the expanded model should be able to
account for lagged bigram languages, that is, languages in which the next symbol can be deduced by
independently combining the identity of the previous symbol, the identity of the symbol before that,
and so on. This restriction to independent contributions from each lag is actually a major advantage
over trying to learn N-gram languages, which would require an astronomical number of observations to
learn the probabilities of non-independent combinations of symbols. Moreover, the lags become less
precise the further in the past one considers. That is, the model distinguishes the symbol at lag 1
from the symbol at lag 2 with more temporal resolution than it distinguishes the symbol at lag 11 from
the symbol at lag 12. Subject to this constraint, the model can capture long-range correlations among
symbols. We suspect that, because language includes constraints over a wide variety of time scales,
this property constitutes an important advance over existing computational models of semantic memory
(e.g., Landauer & Dumais, 1997; Griffiths, Steyvers, & Tenenbaum, 2007).
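To give a sense of scale for the advantage of lagged bigrams over N-grams, one can simply count table
entries. This is illustrative arithmetic only: the vocabulary size of 10,000 and the use of five lags
are hypothetical choices, and entry counts are only a rough stand-in for the number of observations
needed. A lagged bigram model with lags 1 through 5 needs one V x V table per lag, whereas a full
N-gram model that conditions jointly on the same five preceding symbols needs V^6 entries.

```python
def lagged_bigram_entries(vocab_size: int, max_lag: int) -> int:
    # One vocab x vocab transition table per lag; the lags are combined independently.
    return max_lag * vocab_size ** 2


def ngram_entries(vocab_size: int, n: int) -> int:
    # One entry per joint configuration of the n - 1 context symbols and the next symbol.
    return vocab_size ** n


V = 10_000  # hypothetical vocabulary size
print(f"lagged bigram, lags 1-5: {lagged_bigram_entries(V, 5):.1e}")  # 5.0e+08
print(f"6-gram:                  {ngram_entries(V, 6):.1e}")          # 1.0e+24
```

The gap of roughly sixteen orders of magnitude is what makes the independence assumption attractive,
even though, as discussed next, natural language violates it.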

As a practical implementation matter, we have determined that the generalization from temporal
context to temporal history is not a major obstacle, because history can be composed from states of
temporal context and the operator performing that construction is linear. The major open problem to be
solved in describing language is polysemy. In practice, language cannot simply be described as a
lagged bigram language. The problem is that polysemous symbols (words with multiple meanings) play
different roles in different contexts. This prevents the statistics from being expressed as a lagged
bigram language: if what follows the word FLY depends on whether it occurs in a discussion of baseball
or a discussion of insects, we cannot treat the effect of that symbol independently of the other
symbols nearby. The solution is to generate a new set of symbols in which the meanings of polysemous
words are distinguished by their context. This amounts to a blind source separation problem, which is
relatively well understood. We have secured funding from NSF (BCS-1058937) to pursue this line of
research and construct a large-scale computational model from naturally occurring text.

Detailed  scientific  description  of  results 

Taken as a whole, the set of behavioral findings accommodated in this mathematical framework
reflects a major advance towards a satisfactory theory of memory. Here we describe the model and the
results in more detail so that the interested reader can appreciate these deep connections between
fields. By way of introduction, we first briefly recap the formalism of the model and the behavioral
simulations. These are described in more detail elsewhere (Shankar & Howard, 2012; Howard et al., In
preparation).
Figure 1. Schematic illustrating the construction of $\mathbf{T}$ from $\mathbf{t}$ and the role of $\mathbf{M}$. a. In this
schematic, the stimulus dimensions (columns) have been ordered according to the sequence in which the
stimuli were presented. Darker shading corresponds to higher levels of activation. Different rows
correspond to different values of $s$. Each row of $\mathbf{t}$ represents the stimuli as an exponential
function. The operator $\mathbf{L}^{-1}_{k}$ takes $\mathbf{t}$ into $\mathbf{T}$ at each moment. Stimuli presented in the recent
past have a concise representation; stimuli further in the past have overlapping representations. For
visual clarity, values in $\mathbf{T}$ have been scaled to their maximum value. b. At each moment, the current
value of $\mathbf{f}$ is used to update the current value of $\mathbf{t}$. The current value of $\mathbf{t}$ is used to update
$\mathbf{T}$, which is in turn used to update $\mathbf{M}$. The current value of $\mathbf{M}$ and the current state of $\mathbf{T}$ are
used to generate a prediction $\mathbf{p}$.


Representing  the  past 

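Given a stimulus history $\mathbf{f}(\tau)$ leading up to the present moment, the goal is to represent that
function using only operations that are local in time. This can be accomplished by using a set of
leaky integrators $\mathbf{t}(s)$, each of which obeys the differential equation

$$\frac{d\mathbf{t}(\tau, s)}{d\tau} = -s\,\mathbf{t}(\tau, s) + \mathbf{f}(\tau), \qquad (1)$$

where $\tau$ is physical time. The solution to Eq. 1 is the Laplace transform of $\mathbf{f}(\tau' < \tau)$:

$$\mathbf{t}(\tau, s) = \int_{-\infty}^{\tau} e^{-s(\tau - \tau')}\,\mathbf{f}(\tau')\,d\tau'. \qquad (2)$$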
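A quick numerical check that Eq. 2 solves Eq. 1, in the spirit of the first simulation approach
mentioned earlier (inputs treated as delta functions). This is a minimal Python sketch, not the R
library itself: the decay rates, stimulus times, and step size are illustrative assumptions, and
forward Euler is used only because it is the simplest integrator.

```python
import numpy as np


def t_closed_form(s, stim_times, now):
    # Eq. 2 for f(tau) = a sum of unit delta functions at stim_times:
    # each past stimulus contributes exp(-s * (now - tau_i)) to every integrator.
    lags = now - np.asarray(stim_times, dtype=float)[:, None]   # shape (n_stim, 1)
    return np.exp(-s[None, :] * lags).sum(axis=0)               # shape (n_s,)


def t_euler(s, stim_times, now, dt=1e-4):
    # Forward-Euler integration of Eq. 1; a unit delta input is a jump of +1.
    t = np.zeros_like(s)
    n_steps = int(round(now / dt))
    jump_steps = {int(round(x / dt)) for x in stim_times}
    for n in range(n_steps + 1):
        if n in jump_steps:
            t = t + 1.0
        if n < n_steps:
            t = t - dt * s * t          # dt/dtau = -s t between inputs
    return t


s = np.array([0.1, 0.5, 1.0, 2.0])      # decay rates (illustrative)
stim_times = [1.0, 4.0, 7.5]            # delta inputs at these physical times
print(t_closed_form(s, stim_times, now=10.0))
print(t_euler(s, stim_times, now=10.0))
```

The two printed vectors agree up to the Euler discretization error, which shrinks in proportion to `dt`.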

This means that $\mathbf{t}(s)$ contains all of the information in the stimulus history up to that point.
Armed with this insight, we can construct an approximation of the inverse Laplace transform. It turns
out that a method attributable to Post (1930) has just the properties we want. Referring to the
approximation operator for integer $k$ as $\mathbf{L}^{-1}_{k}$, we write

$$\begin{aligned}
\mathbf{T}(\overset{*}{\tau}) &= \mathbf{L}^{-1}_{k}\,\mathbf{t}(s) && (3)\\
&= \frac{(-1)^{k}}{k!}\,s^{k+1}\,\mathbf{t}^{(k)}(s), && (4)
\end{aligned}$$

where $\mathbf{t}^{(k)}(s)$ is the $k$th derivative of $\mathbf{t}$ with respect to $s$. The approximation approaches
the inverse Laplace transform as $k \to \infty$. $\mathbf{T}(\overset{*}{\tau})$ approximates the stimulus history as a
function of internal time $\overset{*}{\tau}$. Figure 1a provides a schematic summary of the construction of
$\mathbf{T}(\overset{*}{\tau})$ from $\mathbf{t}(s)$ at any given moment.
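The following sketch applies Eqs. 3 and 4 to a single delta input, computing the $k$th derivative by
repeated finite differences, loosely in the spirit of the third simulation approach (explicitly
computing the inversion operator). It assumes the correspondence $|\overset{*}{\tau}| = k/s$ that
accompanies Post's formula, which the text above does not state, and an arbitrary uniform grid of $s$
values. For one delta presented $\Delta$ steps ago, Eq. 4 gives $s^{k+1}\Delta^{k}e^{-s\Delta}/k!$,
which peaks at $|\overset{*}{\tau}| = k\Delta/(k+1)$ and therefore approaches the true lag as $k$ grows.

```python
import math

import numpy as np


def post_inverse(t, s, k):
    """Approximate L_k^{-1} of Eqs. 3-4: (-1)^k / k! * s^(k+1) * d^k t / ds^k.

    The k-th derivative is taken numerically by applying np.gradient k times,
    so the first and last k grid points are unreliable (one-sided differences).
    """
    deriv = np.array(t, dtype=float)
    for _ in range(k):
        deriv = np.gradient(deriv, s)
    return (-1) ** k / math.factorial(k) * s ** (k + 1) * deriv


k, elapsed = 4, 10.0
s = np.linspace(0.05, 5.0, 4000)       # grid of decay rates (assumed)
t = np.exp(-s * elapsed)               # Eq. 2 for one delta input `elapsed` ago
T = post_inverse(t, s, k)

inner = slice(k, len(s) - k)           # drop edge points contaminated by the stencil
tau_star = k / s                       # assumed correspondence |tau*| = k / s
peak = tau_star[inner][np.argmax(T[inner])]
print(peak)                            # close to k * elapsed / (k + 1) = 8
```

Increasing `k` moves the peak toward the true lag, but the higher-order numerical derivative becomes
more sensitive to grid spacing and rounding.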

Predicting  the  future 

At each moment in time, we can not only remember the past but also use history to predict the
stimulus that will be presented at the next moment. This can be accomplished by comparing the current
state of $\mathbf{T}$ to the previous states of $\mathbf{T}$ in which each stimulus was experienced. Following TCM
(Howard & Kahana, 2002; Howard, Fotedar, Datey, & Hasselmo, 2005; Sederberg, Howard, & Kahana, 2008),
at each moment we form the outer product between the currently experienced stimulus and the current
state of $\mathbf{T}$ and add it to a tensor $\mathbf{M}$:

$$\frac{d\mathbf{M}}{d\tau} = |\mathbf{f}(\tau)\rangle\langle\mathbf{T}(\tau)|. \qquad (5)$$

The statistics of the environment are thus stored in $\mathbf{M}$ (see also Figure 1b). Each stimulus is
encoded in $\mathbf{M}$ as the superposition of the states of temporal history in which it was experienced.
Again following TCM, we can then use the current state of the temporal history as a probe to predict
what stimulus will be presented at the next time step:

$$|\mathbf{p}(\tau)\rangle = \mathbf{M}\,|\mathbf{T}(\tau)\rangle, \qquad (6)$$

where $\mathbf{p}$ is referred to as the prediction vector. Combining Eqs. 5 and 6, we observe that at each
time step, each stimulus is predicted to the extent that the current state of $\mathbf{T}$ overlaps with the
states of $\mathbf{T}$ in which that stimulus was previously encoded.

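It should be noted that the summation on the right-hand side of Eq. 6 is weighted by $g(\overset{*}{\tau})$,
the number density of nodes representing a particular value of $\overset{*}{\tau}$. If $N$ indexes the number of
the cell, $g(\overset{*}{\tau}) = dN/d\overset{*}{\tau}$. More explicitly,

$$|\mathbf{p}\rangle = \int_{-\infty}^{0} \mathbf{M}_{\overset{*}{\tau}}\, g(\overset{*}{\tau})\, |\mathbf{T}_{\overset{*}{\tau}}\rangle\, d\overset{*}{\tau}, \qquad (7)$$

where the subscript notation $\mathbf{M}_{\overset{*}{\tau}}$ refers to the matrix within the three-tensor $\mathbf{M}$ corresponding
to a specific value of $\overset{*}{\tau}$. Similarly, $|\mathbf{T}_{\overset{*}{\tau}}\rangle$ refers to the vector within $\mathbf{T}$
corresponding to a particular value of $\overset{*}{\tau}$.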
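A toy version of Eqs. 5 to 7: a repeating sequence A B C is learned by adding outer products to
$\mathbf{M}$, and the current history is then used as a probe. Everything here is an assumption of the
sketch rather than the published implementation: $\mathbf{T}$ is computed in closed form from Eq. 4 for
delta inputs (one stimulus per time step, one-hot stimulus coding, $|\overset{*}{\tau}| = k/s$), $k = 4$,
and the $\overset{*}{\tau}$ nodes are log spaced. Because the prediction is a plain sum over nodes, the
spacing of the nodes plays the role of $g(\overset{*}{\tau})$ in Eq. 7.

```python
import math

import numpy as np

K = 4                                        # order of the Post approximation (assumed)
TAU_STAR = np.geomspace(0.5, 50.0, 60)       # |tau*| of each node, log spaced (assumed)


def post_kernel(tau_star, lag, k=K):
    # T for a single unit delta presented `lag` steps ago: Eq. 4 with s = k / |tau*|
    s = k / tau_star
    return s ** (k + 1) * lag ** k * np.exp(-s * lag) / math.factorial(k)


def big_T(history, n_stim):
    # T as an (n_nodes, n_stim) array; history[-1] was presented one step ago
    T = np.zeros((len(TAU_STAR), n_stim))
    for lag, item in enumerate(reversed(history), start=1):
        T[:, item] += post_kernel(TAU_STAR, lag)
    return T


def train(sequence, n_stim):
    # Eq. 5: at each step add |f><T|, pairing the current (one-hot) stimulus with
    # the history that preceded it. M has shape (n_stim, n_nodes, n_stim).
    M = np.zeros((n_stim, len(TAU_STAR), n_stim))
    for step, item in enumerate(sequence):
        M[item] += big_T(sequence[:step], n_stim)
    return M


def predict(M, T):
    # Eqs. 6-7: p = sum over nodes of M_tau* T_tau*; node spacing acts as g(tau*)
    return np.einsum("aij,ij->a", M, T)


seq = [0, 1, 2] * 20                          # A B C A B C ...
M = train(seq, n_stim=3)
p = predict(M, big_T(seq + [0], n_stim=3))    # probe with a history that ends in A
print(np.round(p / p.sum(), 3))               # normalized prediction for A, B, C
```

After a history ending in A, the stored encoding context of B matches the probe best, so B (index 1)
receives the largest prediction.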


Predicting the future using backward replay. It would be extraordinarily useful if we could not
only predict the immediately following moment but also generate a trajectory of future states leading
from the present. There are many ways that one could conceivably implement a representation like $\mathbf{T}$.
One of the major advantages of the approach we developed in the previous program period is that it
provides a concise and physical mechanism for calculating such trajectories into the future.

Note that we can write a discrete version of the differential equation encoding the Laplace
transform (Eq. 1):

$$\mathbf{t}_{\tau+1}(s) = \mathbf{R}\,\mathbf{t}_{\tau}(s) + \mathbf{f}_{\tau}. \qquad (8)$$

Here the operator $\mathbf{R}$ (in analogy with $\rho$ in TCM) is simply a diagonal matrix with $\rho = e^{-s}$ on
the row corresponding to each value of $s$ in the sheet $\mathbf{t}(s)$.
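The recursion of Eq. 8 is easy to run directly. In this hypothetical Python sketch, each stimulus
occupies one time step and is coded as a one-hot vector, and the decay rates are arbitrary. The result
reproduces the exponential gradient across columns that Figure 1a shows schematically.

```python
import numpy as np


def run_eq8(sequence, s, n_stim):
    # Discrete Eq. 8: t_{tau+1}(s) = R t_tau(s) + f_tau, with R = diag(exp(-s)).
    # Rows of t index s; columns index stimulus dimensions (cf. Figure 1a).
    R = np.diag(np.exp(-s))
    t = np.zeros((len(s), n_stim))
    for item in sequence:
        f = np.zeros(n_stim)
        f[item] = 1.0                # one-hot coding of the stimulus (assumption)
        t = R @ t + f[None, :]       # every row of t receives the same input
    return t


s = np.array([0.1, 0.5, 1.0])       # illustrative decay rates
t = run_eq8([0, 1, 2, 3, 4], s, n_stim=5)
# Column j holds exp(-s * (4 - j)): older stimuli have decayed for more steps.
print(np.round(t, 3))
```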

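In order to construct an estimate of what stimulus will be available $\delta$ steps in the future, we
need to estimate the state of $\mathbf{T}$ that will obtain at that time. That is, we can estimate
$\mathbf{p}(\tau + \delta)$ by operating on $\mathbf{T}(\tau + \delta)$ with $\mathbf{M}$. The difficulty is obtaining
$\mathbf{T}(\tau + \delta)$ at time $\tau$ without waiting $\delta$ additional time steps. As it turns out, this can be
accomplished by simply altering the weights in $\mathbf{L}^{-1}_{k}$. To see how this is possible, note that

$$\begin{aligned}
\mathbf{T}(\tau + \delta, \overset{*}{\tau}) &= \mathbf{L}^{-1}_{k}\,\mathbf{t}(\tau + \delta, s) && (9)\\
&= \mathbf{L}^{-1}_{k}\,\mathbf{R}^{\delta}\,\mathbf{t}(\tau, s) && (10)\\
&= \left[\mathbf{L}^{-1}_{k}\,\mathbf{R}^{\delta}\right]\mathbf{t}(\tau, s). && (11)
\end{aligned}$$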
Figure 2. Schematic illustrating important features necessary for predicting trajectories forward in
time. a. Schematic shows $\mathbf{t}(s)$ for an idealized situation in which the columns correspond to
distinct stimuli in the order in which they were presented. Note the exponential gradient. The operator
that approximates inversion of the Laplace transform, $\mathbf{L}^{-1}_{k}$, is multiplied by $\mathbf{R}^{\delta}$. As a
result, $\mathbf{T}$ is shifted to approximate where those components would be $\delta$ steps in the future.
Compare to Figure 1a. b. Characterization of the change to the weights of $\mathbf{L}^{-1}_{k}$. The center
represents the weight along the diagonal. Flanking values give the weights of the adjacent positions
within a row. Dark curve: $\mathbf{L}^{-1}_{k}$. Lighter curve: $\mathbf{L}^{-1}_{k}\mathbf{R}^{\delta}$. When operated on by
$\mathbf{R}^{\delta}$, the weights are shifted along the diagonal and reduced in magnitude (not to scale).


That is, we can understand $\mathbf{R}^{\delta}$ as operating on $\mathbf{t}(\tau, s)$, as in Eq. 10, or we can
understand it as being absorbed into the weights of $\mathbf{L}^{-1}_{k}$, as in Eq. 11. Because $\mathbf{T}$ is
constructed from $\mathbf{t}(s)$ using known weights, and because $\mathbf{t}(s)$ evolves deterministically in the
absence of new input, it is possible to infer the $\mathbf{T}$ that would obtain in the future and to express
it simply by transiently changing the weights of $\mathbf{L}^{-1}_{k}$ appropriately.
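The translation argument of Eqs. 9 to 11 can be checked numerically by building $\mathbf{L}^{-1}_{k}$ as an
explicit matrix. In this sketch (an assumption, not the authors' code) the $k$th derivative is a power
of a central-difference matrix, $|\overset{*}{\tau}| = k/s$ is assumed, the grid is arbitrary, and no
input arrives during the $\delta$ steps. Multiplying the weights by $\mathbf{R}^{\delta}$ gives the same
$\mathbf{T}$ one would obtain by actually waiting, and the peak moves from about $k\Delta/(k+1)$ to about
$k(\Delta+\delta)/(k+1)$.

```python
import math

import numpy as np

k, n = 4, 800
s = np.linspace(0.05, 5.0, n)                  # decay rates (assumed grid)
D = np.gradient(np.eye(n), s, axis=0)          # matrix form of d/ds (central differences)
# Explicit L_k^{-1} from Eq. 4: (-1)^k / k! * s^(k+1) * (d/ds)^k
L = np.diag((-1) ** k / math.factorial(k) * s ** (k + 1)) @ np.linalg.matrix_power(D, k)
R = np.diag(np.exp(-s))                        # one-step decay operator of Eq. 8


def peak_tau_star(T):
    # Location of the peak in internal time, skipping the k edge points on each
    # side where the one-sided differences of np.gradient are unreliable.
    inner = slice(k, n - k)
    return (k / s[inner])[np.argmax(T[inner])]  # assumed mapping |tau*| = k / s


lag, delta = 10.0, 5
t_now = np.exp(-s * lag)                        # one delta input `lag` steps ago
T_now = L @ t_now                               # Eq. 9 with delta = 0
L_shifted = L @ np.linalg.matrix_power(R, delta)   # modified weights of Eq. 11
T_future = L_shifted @ t_now                    # predicted T, delta steps ahead
T_waited = L @ np.exp(-s * (lag + delta))       # T actually reached after waiting

print(peak_tau_star(T_now), peak_tau_star(T_future))   # near 8 and near 12
```

Only the weights change; the current state $\mathbf{t}(s)$ is left untouched, which is what allows a
trajectory to be generated by sweeping $\delta$.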


A principled form for $g(\overset{*}{\tau})$. In the initial publications describing $\mathbf{T}$, we kept $g(\overset{*}{\tau})$
in Eq. 7 as a general function. However, it turns out that there is a principled choice for
$g(\overset{*}{\tau})$. Consider the distribution of activity across $\mathbf{T}$ caused by a more recent stimulus and
by a less recent stimulus. For the less recent stimulus, the function will be broader around its peak.
It does not make sense to devote as many cells per unit of internal time to the broad peak as to the
narrow peak. We did a calculation in which we determined the function $g(\overset{*}{\tau})$ that equalizes
the information per cell. This turns out to be $g(\overset{*}{\tau}) = 1/|\overset{*}{\tau}|$. Interestingly, this
implies that the cells are logarithmically distributed. That is, internal, retrospective time obeys the
Weber-Fechner law. Simultaneously, timing going forward (as in, say, a temporal bisection task) obeys
the scalar property, with a linear relationship between predicted time and physical time. This is
possible because forward-going timing is governed by the match of an entire $\mathbf{T}$ summed over
$\overset{*}{\tau}$. Both the encoding $\mathbf{T}$ and the retrieval $\mathbf{T}$ are logarithmically distributed, so scalar
timing is observed (see Howard et al., In preparation, for details). Subsequent work has shown that
this choice for $g(\overset{*}{\tau})$ also equalizes the mutual information between adjacent cells within
$\mathbf{T}(\overset{*}{\tau})$ (Shankar & Howard, submitted).
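One way to see why a density $g(\overset{*}{\tau}) \propto 1/|\overset{*}{\tau}|$ suits this representation:
the peak for a stimulus presented $\Delta$ steps ago has a width proportional to $\Delta$, so log-spaced
cells devote roughly the same number of cells to each remembered stimulus, whereas uniformly spaced
cells starve the recent past and spend most cells on the distant past. The sketch below only counts
cells above half maximum; it does not reproduce the information calculation described above. The node
ranges, the number of cells, $k = 4$, and the $|\overset{*}{\tau}| = k/s$ correspondence are assumptions.

```python
import math

import numpy as np

k = 4


def T_single(tau_star, lag):
    # T for one delta input presented `lag` steps ago (Eq. 4 with s = k / |tau*|, assumed)
    s = k / tau_star
    return s ** (k + 1) * lag ** k * np.exp(-s * lag) / math.factorial(k)


def cells_in_peak(nodes, lag):
    # Count nodes whose activity is at least half of the peak value, which for
    # this kernel lies at |tau*| = k * lag / (k + 1).
    peak = T_single(np.array([k * lag / (k + 1)]), lag)[0]
    return int(np.sum(T_single(nodes, lag) >= 0.5 * peak))


log_nodes = np.geomspace(0.1, 10_000.0, 400)   # density proportional to 1/|tau*|
lin_nodes = np.linspace(0.1, 10_000.0, 400)    # uniform density, for comparison
lags = (2, 20, 200)
log_counts = [cells_in_peak(log_nodes, lag) for lag in lags]
lin_counts = [cells_in_peak(lin_nodes, lag) for lag in lags]
print(log_counts)   # roughly constant across lags
print(lin_counts)   # grows with lag; the most recent stimulus gets no cells at all
```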

Behavioral  applications 

The  formalism  described  above  provides  a  framework  for  remembering  a  history  leading  up 
to  the  present  moment  and  generating  a  future  trajectory.  Here  we  summarize  some  highlights  of 
the  behavioral  applications  of  the  model  we  developed  in  the  preceding  period. 
Figure 3. Hacker (1980) gave subjects a rapidly presented list of letters followed by a choice between
two probe letters. Subjects were instructed to choose the more recent probe. Reaction times for correct
responses (those in which the subject chooses the more recent probe) depend very much on the recency
of the more recent probe but are essentially independent of the recency of the less recent probe. In
contrast, error RTs, in which the subject selects the less recent probe, depend very much on the
recency of the less recent probe but are essentially unaffected by the recency of the more recent
probe. Top: data. Bottom: model. This is a one-parameter fit of the model.


Episodic memory across scales. Having a representation of history enables a description of
previously puzzling results. It also allows for an account of phenomena across scales that had
previously been attributed to separate memory stores operating at different time scales. We have
applied the model extensively to two episodic memory tasks: free recall and the judgment of recency
task. In free recall, the model generates scale-free recency and contiguity (Howard et al., In
preparation). While scale-free recency has been assumed by other models (e.g., Brown, Neath, & Chater,
2007), we provided a principled process model that generates it as an emergent property of the
equations given above. Scale-invariant recency at the behavioral level is a natural consequence of
scale-invariance at the level of the representation. Scale-persistent contiguity is accomplished in our
framework by assuming that there can be a recovery of the $\mathbf{t}(s)$ that obtained at a previous moment
in time. This in turn enables recovery of the state of $\mathbf{T}$ that obtained at that time. Unlike
previous attempts using TCM, which have a scale fixed by the specific value of $\rho$ used, this
contiguity effect persists across arbitrarily large scales. It is not precisely scale-invariant,
because the asymmetry observed in the contiguity effect decreases as the scale is increased. This
appears to be consistent with experimental data using final free recall (Unsworth, 2008).

Figure 4. Yntema and Trask (1963) gave subjects a continuous relative judgment of recency task. The
figure shows the probability of choosing a probe item as more recent as a function of its recency and
the recency of the other probe item. Model predictions use the same scanning model as used in Figure 3.
The model has one free parameter.

The major difference between $\mathbf{T}(\overset{*}{\tau})$ and previous models of temporal context is that
$\mathbf{T}(\overset{*}{\tau})$ retains temporal information about when the preceding stimuli were presented. To
illustrate the advantage of this approach, we modeled the judgment of recency (JoR) task. In the
relative JoR task, subjects are given two probe stimuli and asked to choose the one that was presented
more recently. With fast presentation of short lists, the JoR task generates results that have
generally been taken as evidence for serial scanning: the data appear as if the subject examines the
historical record in memory moment by moment, stopping upon finding either of the probes (Figure 3;
Hacker, 1980; McElree & Dosher, 1993; Muter, 1979). We were able to account for these findings
(Howard et al., In preparation) simply by assuming that the subject examines $\mathbf{T}(\overset{*}{\tau})$ at
successive values of $\overset{*}{\tau}$, starting at zero and extending backwards in time. At each moment,
the subject detects a memory for a probe stimulus with probability proportional to the value of $\mathbf{T}$
in the appropriate stimulus column. When the subject detects a memory for one of the probes, she
chooses it; if the scan goes on for long enough without detecting a memory, the subject guesses. This
extremely simple model is able to account for the qualitative pattern of results in accuracy, mean
correct RT, and error RT (Figure 3).
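A minimal Monte Carlo sketch of the scanning account of the relative JoR task. The detection rule
(probability equal to a hypothetical `gain` times the value of $\mathbf{T}$, capped at 1), the log-spaced
scan positions, the use of the number of scanned nodes as a proxy for RT, the $|\overset{*}{\tau}| = k/s$
correspondence, and all parameter values are assumptions of this illustration, not the fitted model
of Howard et al. (In preparation).

```python
import math

import numpy as np

K = 4
NODES = np.geomspace(0.5, 100.0, 80)   # scanned from the present backwards (assumed grid)


def T_single(tau_star, lag, k=K):
    # T for one probe presented `lag` steps ago (Eq. 4 with s = k / |tau*|, assumed)
    s = k / tau_star
    return s ** (k + 1) * lag ** k * np.exp(-s * lag) / math.factorial(k)


def scan_trial(lag_recent, lag_old, rng, gain=0.5):
    """One relative JoR trial. Returns (correct, number of nodes scanned).

    At each node each probe is detected with probability min(1, gain * T);
    `gain` is a hypothetical free parameter standing in for "proportional to".
    """
    p_recent = np.minimum(1.0, gain * T_single(NODES, lag_recent))
    p_old = np.minimum(1.0, gain * T_single(NODES, lag_old))
    for step in range(len(NODES)):
        hit_recent = rng.random() < p_recent[step]
        hit_old = rng.random() < p_old[step]
        if hit_recent and hit_old:
            return bool(rng.random() < 0.5), step + 1   # tie: pick at random
        if hit_recent or hit_old:
            return bool(hit_recent), step + 1
    return bool(rng.random() < 0.5), len(NODES)           # nothing found: guess


def summarize(lag_recent, lag_old, n_trials=4000, seed=0):
    # Accuracy and mean number of scanned nodes on correct trials.
    rng = np.random.default_rng(seed)
    results = [scan_trial(lag_recent, lag_old, rng) for _ in range(n_trials)]
    correct_steps = [steps for ok, steps in results if ok]
    return len(correct_steps) / n_trials, float(np.mean(correct_steps))


for pair in [(1, 4), (1, 8), (4, 8)]:
    accuracy, rt_correct = summarize(*pair)
    print(pair, round(accuracy, 3), round(rt_correct, 2))
```

With these settings, correct scans with the more recent probe at lag 1 stop within the first few nodes
whether the other probe is at lag 4 or lag 8, while moving the more recent probe to lag 4 lengthens the
scan, which is the qualitative pattern the scanning account is meant to capture.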

Most scholars of memory have treated the Hacker (1980) results as evidence for a dedicated
short-term store that is distinct from long-term memory. However, we showed that the simple model
described above also accounts for several phenomena from what are usually thought of as long-term JoR
tasks. Figure 4 shows that the same model provides an excellent qualitative description of the
experimental results of Yntema and Trask (1963), who performed a continuous relative JoR task with
recencies chosen from a very broad temporal range.

In addition, we provided a principled account of several other findings in what are usually thought
of as long-term JoR tasks, including the logarithmic increase in absolute JoRs, in which subjects
provide a numerical estimate of how many steps in the past a stimulus was presented (Hinrichs &
Buschke, 1968; Hinrichs, 1970), and the approximate independence of separate judgments based on
distinct presentations of an item (Hintzman, 2010). The former follows from the choice of
$g(\overset{*}{\tau})$ and the latter from the linearity of $\mathbf{L}^{-1}_{k}$. We also showed that failures to
observe substantial performance in the relative JoR task (Klein, Shiffrin, & Criss, 2007) can be
accounted for by the choice of relative and absolute delays those authors used.
Figure 5. Procedure of a typical temporal mapping experiment. a. In the first phase of training, one
group (top) received CS1 with the shock US presented immediately after CS1 offset. The other group
(bottom) received the US five seconds after offset of CS1. b. Both groups received second-order
conditioning between CS1 and CS2. c. The presumed representation after both phases of training for
each group if the experiences were aligned on CS1 onset. In the second group, CS2 predicts US onset
more robustly than in the first group. After Cole, Barnet, & Miller (1995).


Temporal mapping. Ralph Miller and colleagues have done a series of elegant conditioning
experiments that suggest two fundamental properties of learning that are particularly well suited to
the current mathematical framework. First, they argue that the temporal relations between stimuli form
an essential and unavoidable part of the learning event. Second, they argue that learners can
integrate disparate learning events into a coherent temporal map by aligning different timelines on a
common stimulus.

To make this more concrete, let us describe a specific experiment (Cole et al., 1995). Rats were
trained to associate a 5 s CS1 with a US (shock). In one condition, the time between offset of CS1 and
onset of the US was 0 s. In the other condition, the time between the offset of CS1 and the US was 5 s
(Figure 5a). Let us refer to these as the 0 s and 5 s conditions, respectively. After training the
CS1-US association, a second-order association was formed between CS1 and another 5 s CS2. In both
conditions, the onset of CS2 immediately followed the offset of CS1 (Figure 5b). In neither condition
did CS2 ever co-occur with the US. The first finding was, not surprisingly, that the CR to CS1 was
stronger in the 0 s condition than in the 5 s condition. If associative strength were a scalar value,
we would expect second-order conditioning to CS2 also to be stronger in the 0 s condition than in the
5 s condition. However, exactly the opposite was observed. This result makes no sense from the
perspective of simple associative strength. Miller’s temporal coding hypothesis (Matzel, Held, &
Miller, 1988; Savastano & Miller, 1998) reconciles these findings as follows. If the two learning
episodes are aligned on CS1 (as in Figure 5c), then CS2 does not predict the onset of the US in the
0 s condition. In the 5 s condition, CS2 strongly predicts the onset of the US when the two learning
episodes are aligned.
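The alignment argument reduces to arithmetic on the two timelines. The hypothetical helper below
encodes only the timings stated above (a 5 s CS1, a 0 s or 5 s gap before the US, and a 5 s CS2
beginning at CS1 offset); it says nothing about response strength.

```python
def us_relative_to_cs2(cs1_duration, trace_gap, cs2_duration):
    """Align both training phases on CS1 onset (time 0), as in Figure 5c.

    Phase 1: the US begins `trace_gap` seconds after CS1 offset.
    Phase 2: CS2 begins at CS1 offset and lasts `cs2_duration` seconds.
    Returns (US onset measured from CS2 onset, CS2 duration), in seconds.
    """
    us_onset = cs1_duration + trace_gap
    cs2_onset = cs1_duration
    return us_onset - cs2_onset, cs2_duration


for gap in (0.0, 5.0):
    print(f"{gap:.0f} s condition:", us_relative_to_cs2(5.0, gap, 5.0))
```

In the 0 s condition the aligned US begins at the same moment as CS2, so CS2 gives no advance warning
of it; in the 5 s condition the US is expected at CS2 offset, so CS2 precedes it by its full duration.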

A model must have two basic properties in order to account for this phenomenon. First, the
temporal relationships between stimuli, rather than a simple scalar associative strength, must be
learned. Second, some mechanism for integrating disparate episodes into a coherent synthetic
representation is necessary. The representation of temporal history offered by $\mathbf{T}$ satisfies the
first constraint. The ability to retrieve temporal contexts satisfies the second constraint (see also
Howard et al., 2005; Rao & Howard, 2008). We showed that the mathematical properties of the model
enable it to construct satisfactory temporal maps between disparate experiences (Howard et al., In
preparation).